Smart Paper Notes

Buildings Classification using Very High Resolution Satellite Imagery

Mohammad Dimassi , Abed Ellatif Samhat , Mohammad Zaraket , Jamal Haidar , Mustafa Shukor , Ali J. Ghandour

Categories: Computer Vision

2021-11-29

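Building classification using satellite imagery is becoming increasingly important for applications such as damage assessment, resource allocation, and population estimation. In this work, we focus on building damage assessment (BDA) and building type classification (BTC) of residential and non-residential buildings. We propose to rely solely on RGB satellite images and follow a two-stage deep-learning approach, in which building footprints are first extracted with a semantic segmentation model and the cropped images are then classified. Because no appropriate dataset exists for residential/non-residential building classification, we introduce a new dataset of high-resolution satellite images. We conduct extensive experiments to select the best hyperparameters, model architecture, and training paradigm, and we propose a new transfer-learning-based approach that outperforms classical methods. Finally, we validate the proposed approach on both applications, showing excellent accuracy and F1-score metrics.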

Translated by Google Translate

PIZZA: A new benchmark for complex end-to-end task-oriented parsing

Konstantine Arkoudas , Nicolas Guenon des Mesnards , Melanie Rubino , Sandesh Swamy , Saarthak Khanna , Weiqi Sun , Khan Haidar

Categories: Natural Language Processing | Machine Learning

2022-12-01

Much recent work in task-oriented parsing has focused on finding a middle ground between flat slots and intents, which are inexpressive but easy to annotate, and powerful representations such as the lambda calculus, which are expressive but costly to annotate. This paper continues the exploration of task-oriented parsing by introducing a new dataset for parsing pizza and drink orders, whose semantics cannot be captured by flat slots and intents. We perform an extensive evaluation of deep-learning techniques for task-oriented parsing on this dataset, including different flavors of seq2seq systems and RNNGs. The dataset comes in two main versions, one in a recently introduced utterance-level hierarchical notation that we call TOP, and one whose targets are executable representations (EXR). We demonstrate empirically that training the parser to directly generate EXR notation not only solves the problem of entity resolution in one fell swoop and overcomes a number of expressive limitations of TOP notation, but also results in significantly greater parsing accuracy.

ON-DEMAND-FL: A Dynamic and Efficient Multi-Criteria Federated Learning Client Deployment Scheme

Mario Chahoud , Hani Sami , Azzam Mourad , Safa Otoum , Hadi Otrok , Jamal Bentahar , Mohsen Guizani

Categories: Artificial Intelligence | Machine Learning

2022-11-05

In this paper, we increase the availability and integration of devices in the learning process to enhance the convergence of federated learning (FL) models. Federated learning, which learns over decentralized datasets, avoids gathering all the data in one location and thereby preserves privacy. The server aggregates the updated weights obtained from each client's dataset over a number of rounds until the model converges. Most of the literature proposes client selection techniques to accelerate convergence and boost accuracy. However, none of the existing proposals focuses on the flexibility to deploy and select clients as needed, wherever and whenever that may be. Because the environment is highly dynamic, some devices are not available to serve as clients in FL, which reduces the data available for learning and limits the applicability of existing client selection solutions. We address these limitations by introducing On-Demand-FL, a client deployment approach for FL that offers more volume and heterogeneity of data in the learning process. We use containerization technology such as Docker to build efficient environments, with IoT and mobile devices serving as volunteers, and Kubernetes for orchestration. A genetic algorithm (GA) is used to solve the multi-objective optimization problem owing to its evolutionary strategy. Experiments using the Mobile Data Challenge (MDC) dataset and the Localfed framework illustrate the relevance of the proposed approach and the efficiency of on-the-fly client deployment whenever and wherever needed, with fewer discarded rounds and more available data.
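
The server-side aggregation step mentioned in this abstract can be illustrated with a minimal sketch. It assumes sample-size-weighted averaging in the style of FedAvg; the abstract does not state which aggregation rule On-Demand-FL uses, and the function name and data layout here are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: each model is a dict mapping a layer name to a flat list of floats.
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine client weights into one model, weighting each client by its sample count.

    Assumption: FedAvg-style weighting; the paper's actual rule is not given in the abstract.
    """
    if not updates:
        raise ValueError("at least one client update is required")
    total_samples = sum(n for _, n in updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    first_weights, _ = updates[0]
    averaged: Weights = {}
    for name, values in first_weights.items():
        acc = [0.0] * len(values)
        for weights, n in updates:
            # Each client contributes in proportion to the data it trained on.
            for i, v in enumerate(weights[name]):
                acc[i] += v * n / total_samples
        averaged[name] = acc
    return averaged


# One aggregation round with two illustrative clients (values are made up).
round_updates = [({"w": [1.0, 2.0]}, 1), ({"w": [3.0, 4.0]}, 3)]
global_weights = federated_average(round_updates)
```

In a full training loop the server would repeat this step every round, sending `global_weights` back to whichever clients are available at that time.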

BanglaSarc: A Dataset for Sarcasm Detection

Tasnim Sakib Apon , Ramisa Anan , Elizabeth Antora Modhu , Arjun Suter , Ifrit Jamal Sneha , MD. Golam Rabiul Alam

Categories: Natural Language Processing | Artificial Intelligence

2022-09-27

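Bangla is one of the most widely spoken languages in the world, and its use on social media is growing as well. Sarcasm, a positive statement or remark with an underlying negative motivation, is used extensively on today's social media platforms. Sarcasm detection in English has improved significantly over the past several years, but the situation for Bangla sarcasm detection remains unchanged. As a result, it is still difficult to identify sarcasm in Bangla, and the lack of high-quality data is a major contributing factor. This paper presents BanglaSarc, a dataset constructed specifically for sarcasm detection in Bangla text. The dataset contains 5112 comments/statuses and other content collected from various online social platforms (such as Facebook and YouTube) as well as a few online blogs. Because the amount of categorized comment data collected in Bengali is limited, this dataset will aid research on identifying sarcasm, recognizing people's emotions, detecting various types of Bengali expressions, and other domains. The dataset is publicly available at https://www.kaggle.com/datasets/sakibapon/banglasarc.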

An End-to-End OCR Framework for Robust Arabic-Handwriting Recognition using a Novel Transformers-based Model and an Innovative 270 Million-Words Multi-Font Corpus of Classical Arabic with Diacritics

Aly Mostafa , Omar Mohamed , Ali Ashraf , Ahmed Elbehery , Salma Jamal , Anas Salah , Amr S. Ghoneim

Categories: Computer Vision | Natural Language Processing | Machine Learning

2022-08-20

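This research is the second phase in a series of investigations on optical character recognition (OCR) of Arabic historical documents, examining how different modeling procedures interact with the problem. The first study examined the effect of Transformers on our custom-built Arabic dataset. One drawback of that first study was the size of the training data: owing to a lack of resources, only 15,000 of our 30 million images were used. In addition, we add an image enhancement layer, time and space optimization, and a post-correction layer to help the model predict the correct word for the context. Notably, we propose an end-to-end text recognition approach that uses a Vision Transformer, namely BEiT, as the encoder and a vanilla Transformer as the decoder, eliminating CNNs for feature extraction and reducing the model's complexity. Experiments show that our end-to-end model outperforms convolutional backbones. The model attains a CER of 4.46%.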
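
For readers unfamiliar with the metric reported above, the character error rate (CER) is commonly defined as the character-level edit distance between the prediction and the reference, divided by the reference length. The sketch below follows that common definition; the abstract does not say how the authors normalize or preprocess text, so those details are assumptions, and the function names are illustrative.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of character insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete ref_char
                    current[j - 1] + 1,  # insert hyp_char
                    previous[j - 1] + (ref_char != hyp_char),  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of reference characters (assumed normalization)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative strings only; not taken from the paper's data.
example_cer = character_error_rate("kitten", "sitting")
```

Because Python strings are sequences of Unicode code points, the same functions work on Arabic text, although diacritics count as separate characters under this definition.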

AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

Saleh Soltan , Shankar Ananthakrishnan , Jack FitzGerald , Rahul Gupta , Wael Hamza , Haidar Khan , Charith Peris , Stephen Rawls , Andy Rosenbaum , Anna Rumshisky

Categories: Natural Language Processing | Machine Learning

2022-08-02

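In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, trained on a mixture of denoising and causal language modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on a variety of tasks. In particular, we train a 20-billion-parameter multilingual seq2seq model called Alexa Teacher Model (AlexaTM 20B) and show that it achieves state-of-the-art (SOTA) performance on 1-shot summarization tasks, outperforming the much larger 540B PaLM decoder model. AlexaTM 20B also achieves SOTA in 1-shot machine translation, especially for low-resource languages, across almost all language pairs (including Arabic, English, French, German, Hindi, Italian, Japanese, and Telugu) on the Flores-101 dataset. We also show that, in the zero-shot setting, AlexaTM 20B outperforms GPT3 (175B) on the SuperGLUE and SQuADv2 datasets and provides SOTA performance on multilingual tasks such as XNLI, XCOPA, PAWS-X, and XWinograd. Overall, our results present a compelling case for seq2seq models as a powerful alternative to decoder-only models for large language model (LLM) training.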

Multi-Modal Unsupervised Pre-Training for Surgical Operating Room Workflow Analysis

Muhammad Abdullah Jamal , Omid Mohareri

Categories: Computer Vision

2022-07-16

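Data-driven approaches to assist operating room (OR) workflow analysis depend on large curated datasets that are time-consuming and expensive to collect. On the other hand, we see a recent shift from supervised learning to self-supervised and/or unsupervised learning approaches that can learn representations from unlabeled datasets. In this paper, we leverage unlabeled data captured during robotic surgery and propose a novel way to fuse the multi-modal data for a single video frame or image. Instead of producing different augmentations (or "views") of the same image or video frame, we treat the multi-modal data as different views and train the model in an unsupervised manner via clustering. We compare our method with other state-of-the-art methods, and the results show the superior performance of our approach on surgical video activity recognition and semantic segmentation.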

Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems

Jack FitzGerald , Shankar Ananthakrishnan , Konstantine Arkoudas , Davide Bernardi , Abhishek Bhagia , Claudio Delli Bovi , Jin Cao , Rakesh Chada , Amit Chauhan , Luoxin Chen

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-06-15

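We present results from a large-scale experiment on pretraining encoders with parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M to 170M parameters, and their application to the natural language understanding (NLU) component of a virtual assistant system. Although we train using 70% spoken-form data, our teacher models perform comparably to XLM-R and mT5 when evaluated on the written-form Cross-lingual Natural Language Inference (XNLI) corpus. We perform a second stage of pretraining on our teacher models using in-domain data from our system, improving error rates by 3.86% relative for intent classification and 7.01% relative for slot filling. We find that even a 170M-parameter model distilled from our Stage 2 teacher model has 2.88% better intent classification and 7.69% better slot filling error rates than the 2.3B-parameter teacher trained only on public data (Stage 1), emphasizing the importance of in-domain data for pretraining. When evaluated offline using labeled NLU data, our 17M-parameter Stage 2 distilled model outperforms both XLM-R Base (85M params) and DistilBERT (42M params) by 4.23% to 6.14%, respectively. Finally, we present results from a full virtual assistant experimentation platform, where we find that models trained using our pretraining and distillation pipeline outperform models distilled from 85M-parameter teachers by 3.74%-4.91% on an automatic measurement of full-system user dissatisfaction.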
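
The abstract reports improvements as relative reductions in error rate. As a small worked illustration of that arithmetic, the sketch below computes a relative error reduction from two error rates. The numbers in the example are made up for illustration and are not results from the paper, and the function name is hypothetical.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction by which the error rate drops relative to the baseline error rate."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Made-up example: an error rate falling from 10% to 9% is a 1-point absolute
# drop but roughly a 10% relative improvement.
improvement = relative_error_reduction(0.10, 0.09)
```

The distinction matters when reading such figures: a relative figure like "3.86%" describes a change proportional to the baseline error, not a change of 3.86 percentage points.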

An Empirical Study on Activity Recognition in Long Surgical Videos

Zhuohong He , Ali Mottaghi , Aidean Sharghi , Muhammad Abdullah Jamal , Omid Mohareri

Categories: Computer Vision

2022-05-05

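Activity recognition in surgical videos is a key research area for developing next-generation devices and workflow monitoring systems. Because surgeries are long procedures with highly variable lengths, deep learning models for surgical videos often consist of a two-stage setup using a backbone and a temporal sequence model. In this paper, we investigate many state-of-the-art backbones and temporal models to find architectures that yield the strongest performance for surgical activity recognition. We first benchmark model performance on a large-scale activity recognition dataset containing over 800 surgery videos captured in multiple clinical operating rooms. We further evaluate the models on two smaller public datasets, Cholec80 and Cataract-101, which contain 80 and 101 videos, respectively. We find empirically that the Swin-Transformer+BiGRU temporal model yields strong performance on both datasets. Finally, we investigate the adaptability of the model to new domains by fine-tuning models on a new hospital and experimenting with a recent unsupervised domain adaptation approach.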

REFUGE2 Challenge: A Treasure Trove for Multi-Dimension Analysis and Evaluation in Glaucoma Screening

Huihui Fang , Fei Li , Junde Wu , Huazhu Fu , Xu Sun , Jaemin Son , Shuang Yu , Menglu Zhang , Chenglang Yuan , Cheng Bian

Categories: Computer Vision

2022-02-18

With the rapid development of artificial intelligence (AI) in medical image processing, deep learning in color fundus photography (CFP) analysis is also evolving. Although there are some open-source, labeled datasets of CFPs in the ophthalmology community, large-scale datasets for screening only have labels of disease categories, and datasets with annotations of fundus structures are usually small in size. In addition, labeling standards are not uniform across datasets, and there is no clear information on the acquisition device. Here we release a multi-annotation, multi-quality, and multi-device color fundus image dataset for glaucoma analysis on an original challenge -- Retinal Fundus Glaucoma Challenge 2nd Edition (REFUGE2). The REFUGE2 dataset contains 2000 color fundus images with annotations of glaucoma classification, optic disc/cup segmentation, as well as fovea localization. Meanwhile, the REFUGE2 challenge sets three sub-tasks of automatic glaucoma diagnosis and fundus structure analysis and provides an online evaluation framework. Based on the characteristics of multi-device and multi-quality data, some methods with strong generalizations are provided in the challenge to make the predictions more robust. This shows that REFUGE2 brings attention to the characteristics of real-world multi-domain data, bridging the gap between scientific research and clinical application.
